Using this pattern-finding playground I wrote in Python, we can test different reco algorithms.
Here we're testing Sean's k-means clustering vertex-finding algorithm.
\boldsymbol{\pi^+ \rightarrow e^+ + \nu_e}:
\boldsymbol{\pi^+ \rightarrow \mu^+ + \nu_\mu \rightarrow e^+ + \nu_e}
Here are some performance metrics for pattern finding on the \boldsymbol{\pi^+ \rightarrow \mu^+ + \nu_\mu \rightarrow e^+ + \nu_e} data set:
The above shows that the pattern count is sometimes correct (in this case there should always be exactly 1 pattern) while validation still fails (validation here just checks that every reconstructed pattern contains the same tracklets as the true patterns). Together with the pattern reconstruction by particle composition plots above, this is evidence that we are sometimes missing tracklets from our patterns. My guess is that some tracklets in the truth pattern do not make hits in the ATAR, so they are never added to the reco pattern.
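To make the distinction between "correct pattern count" and "valid" concrete, here is a minimal sketch of that kind of check. It assumes a pattern can be represented as a collection of tracklet IDs; the function names and the toy event are hypothetical and are not taken from the playground code.

```python
def is_valid(reco_patterns, true_patterns):
    """True when reco and truth contain exactly the same groups of tracklets."""
    # Pattern order and tracklet order are irrelevant, so compare sets of sets.
    reco = {frozenset(p) for p in reco_patterns}
    truth = {frozenset(p) for p in true_patterns}
    return reco == truth


def missing_tracklets(reco_patterns, true_patterns):
    """Tracklet IDs that appear in truth but in no reconstructed pattern."""
    reco_ids = set().union(*reco_patterns)
    true_ids = set().union(*true_patterns)
    return true_ids - reco_ids


# Hypothetical event: truth has one pattern with three tracklets, but
# tracklet 2 never makes it into the reco pattern.
truth = [[0, 1, 2]]
reco = [[0, 1]]

count_ok = len(reco) == len(truth)      # pattern count matches
valid = is_valid(reco, truth)           # but validation fails
lost = missing_tracklets(reco, truth)   # because tracklet 2 is missing
```

Listing the missing IDs per event would be one way to test the guess that the lost tracklets are the ones with no ATAR hits.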
The above shows that when we have "complete" information in both the "front" (x,z) ATAR planes and the "back" (y,z) ATAR planes, we rarely misreconstruct the event. Problems only arise when the information in one of the planes is incomplete. I should note that only the (x,z) ATAR planes are used for the reconstruction here; in other words, validation is based only on the (x,z) ATAR planes.
The plot above was generated by constructing vertices from either just the (y,z) ATAR plane information or just the (x,z) ATAR plane information, and comparing how successful each approach was. It shows that ~99% of the time, at least one of the (x,z) or (y,z) planes contains enough information to correctly reconstruct the patterns. However, ~15% of the time one of the planes does not have enough information. In our "final" algorithm, we should definitely use the information from both planes in conjunction (this can be a little tricky).
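For reference, the bookkeeping behind a comparison like this can be sketched as follows. It assumes one validation flag per event for each single-plane reconstruction; the function name and the toy flags are hypothetical and are not the real results.

```python
def plane_summary(valid_xz, valid_yz):
    """Fractions of events where both, at least one, or exactly one view validates."""
    if len(valid_xz) != len(valid_yz) or not valid_xz:
        raise ValueError("need two equal-length, non-empty lists of flags")
    n = len(valid_xz)
    both = sum(1 for a, b in zip(valid_xz, valid_yz) if a and b)
    at_least_one = sum(1 for a, b in zip(valid_xz, valid_yz) if a or b)
    return {
        "both": both / n,
        "at_least_one": at_least_one / n,
        # Events where one view succeeds and the other fails.
        "exactly_one": (at_least_one - both) / n,
    }


# Toy flags for four events (made up for illustration only).
flags_xz = [True, True, False, True]
flags_yz = [True, False, True, True]
summary = plane_summary(flags_xz, flags_yz)
```

In this picture, "at_least_one" corresponds to the ~99% figure and "exactly_one" to the ~15% figure.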
The plot above shows a small parameter scan over the parameters "sigma" and "n_iters" of Sean's vertex-finding clustering algorithm. Sigma controls the "penalty" for points being distant from a centroid, while n_iters is the number of centroid-moving iterations the algorithm goes through. We'd expect performance (average validation == percent of events that were valid) to increase with n_iters, but we don't see that. We'd also expect a "sigma sweet spot" smaller than 10, but we don't see that either. I don't fully trust that this plot was generated correctly; i.e., there may be a bug producing misleading results.
Sean made a bug fix that prevents the clustering algorithm from adding vertices that have no points attached to them. It didn't seem to change the results:
Here's what I meant when I said it's tricky to use both the (x,z) and (y,z) ATAR plane information: simply feeding (x,y,z) endpoints, created from fitting both the (x,z) and (y,z) ATAR planes, into the clustering algorithm gives worse performance. My guess is that we get the "worst of both worlds"; i.e., when either endpoint-finding algorithm fails, the pattern reconstruction fails. But the last plot does not seem to support that hypothesis.
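To illustrate what these two knobs could mean, here is a minimal sketch of a Gaussian-weighted ("soft") k-means in 2D. This is an assumption about the general flavor of the algorithm, not Sean's implementation: the function name, the exp(-d^2 / (2 sigma^2)) weighting, and the toy points are all hypothetical.

```python
import math


def soft_kmeans(points, centroids, sigma, n_iters):
    """Gaussian-weighted k-means on 2D points; returns the final centroids.

    Assumes sigma > 0 and at least one centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iters):
        # Per-centroid accumulators: [weighted x, weighted y, total weight].
        acc = [[0.0, 0.0, 0.0] for _ in centroids]
        for px, py in points:
            d2 = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
            # Subtract the smallest distance before exponentiating so the
            # nearest centroid always gets weight 1 (avoids a 0/0 normalization).
            d2_min = min(d2)
            w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
            norm = sum(w)
            for k, wk in enumerate(w):
                r = wk / norm  # share of this point given to centroid k
                acc[k][0] += r * px
                acc[k][1] += r * py
                acc[k][2] += r
        # Move each centroid to its weighted mean; leave it in place if it
        # attracted no weight at all.
        centroids = [
            (sx / sw, sy / sw) if sw > 0.0 else c
            for (sx, sy, sw), c in zip(acc, centroids)
        ]
    return centroids


# Toy scan over both parameters on two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
start = [(0.0, 0.0), (11.0, 11.0)]
scan = {
    (sigma, n_iters): soft_kmeans(points, start, sigma, n_iters)
    for sigma in (0.5, 1.0, 10.0)
    for n_iters in (1, 5, 20)
}
```

In a scheme like this, a small sigma makes each point count almost entirely toward its nearest centroid, while a large sigma lets distant points pull on every centroid. If the real algorithm behaves similarly, a flat response to both sigma and n_iters would be surprising, which is consistent with suspecting a bug in how the scan was produced.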
Here are some more \boldsymbol{\pi^+ \rightarrow e^+ + \nu_e} plots:
Reconstruction using 3D information (both planes) instead of either (x,z) or (y,z):
I find a few things interesting: